Tage #443

ABenC377 · 2024-12-09T11:35:44Z

Adding a TAGE branch predictor.

Performance relative to previous best (Perceptron) summarised below:

Benchmark	BP_update time	BP_update mispredict	TAGE time	TAGE mispredict	Performance change (percentage)	Mispredict change (percentage points)
CloverLeaf serial gcc8.3.0 armv8.4	11715	10.70%	10511	2.06%	-10.28% ✅	-8.64% ✅
CloverLeaf serial gcc9.3.0 armv8.4	11830	10.40%	10321	1.85%	-12.76% ✅	-8.55% ✅
CloverLeaf serial gcc10.3.0 armv8.4	12457	12.70%	10550	1.74%	-15.31% ✅	-10.96% ✅
CloverLeaf serial armclang20 armv8.4	11041	14.50%	9069	1.85%	-17.86% ✅	-12.65% ✅
CloverLeaf openmp gcc8.3.0 armv8.4	15635	9.78%	14530	2.00%	-7.07% ✅	-7.78% ✅
CloverLeaf openmp gcc9.3.0 armv8.4	15559	7.82%	14680	1.64%	-5.65% ✅	-6.18% ✅
CloverLeaf openmp gcc10.3.0 armv8.4	15291	9.33%	13872	1.56%	-9.28% ✅	-7.77% ✅
CloverLeaf openmp armclang20 armv8.4	13939	13.30%	12273	1.74%	-11.95% ✅	-11.56% ✅
miniBUDE openmp gcc8.3.0 armv8.4	21292	7.37%	20613	5.71%	-3.19% ✅	-1.66% ✅
miniBUDE openmp gcc9.3.0 armv8.4	20652	7.43%	20364	5.71%	-1.39% ✅	-1.72% ✅
miniBUDE openmp gcc10.3.0 armv8.4	20810	7.41%	20691	5.71%	-0.57% ✅	-1.70% ✅
miniBUDE openmp armclang20 armv8.4	19365	10.30%	19808	5.05%	+2.29% ❌	-5.25% ✅
STREAM serial gcc8.3.0 armv8.4	6619	0.51%	6672	0.42%	+0.80% ❌	-0.09% ✅
STREAM serial gcc9.3.0 armv8.4	6612	0.50%	6778	0.42%	+2.51% ❌	-0.08% ✅
STREAM serial gcc10.3.0 armv8.4	6583	0.44%	6649	0.42%	+1.00% ❌	-0.03% ✅
STREAM serial armclang20 armv8.4	7018	1.01%	7071	0.75%	+0.76% ❌	-0.26% ✅
STREAM openmp gcc8.3.0 armv8.4	10200	1.90%	10066	0.67%	-1.31% ✅	-1.23% ✅
STREAM openmp gcc9.3.0 armv8.4	10037	2.08%	9859	0.50%	-1.77% ✅	-1.58% ✅
STREAM openmp gcc10.3.0 armv8.4	9814	1.81%	9799	0.63%	-0.15% ✅	-1.18% ✅
STREAM openmp armclang20 armv8.4	10292	3.04%	10459	1.21%	+1.62% ❌	-1.83% ✅
TeaLeaf 2D serial gcc8.3.0 armv8.4	11874	15.20%	10871	1.04%	-8.45% ✅	-14.16% ✅
TeaLeaf 2D serial gcc9.3.0 armv8.4	11846	15.20%	10847	1.05%	-8.43% ✅	-14.15% ✅
TeaLeaf 2D serial gcc10.3.0 armv8.4	12020	15.20%	11078	1.05%	-7.84% ✅	-14.15% ✅
TeaLeaf 2D serial armclang20 armv8.4	21353	9.16%	20597	3.27%	-3.54% ✅	-5.89% ✅
TeaLeaf 2D openmp gcc8.3.0 armv8.4	17221	7.05%	16118	1.18%	-6.40% ✅	-5.87% ✅
TeaLeaf 2D openmp gcc9.3.0 armv8.4	17224	7.75%	16485	1.43%	-4.29% ✅	-6.32% ✅
TeaLeaf 2D openmp gcc10.3.0 armv8.4	16595	6.70%	16221	0.85%	-2.25% ✅	-5.85% ✅
TeaLeaf 2D openmp armclang20 armv8.4	52356	9.29%	50683	2.38%	-3.20% ✅	-6.91% ✅
TeaLeaf 3D serial gcc8.3.0 armv8.4	13645	8.73%	13203	1.15%	-3.24% ✅	-7.58% ✅
TeaLeaf 3D serial gcc9.3.0 armv8.4	14157	10.70%	13463	1.57%	-4.90% ✅	-9.13% ✅
TeaLeaf 3D serial gcc10.3.0 armv8.4	14331	11.00%	13167	1.50%	-8.12% ✅	-9.50% ✅
TeaLeaf 3D serial armclang20 armv8.4	19199	22.40%	16675	1.69%	-13.15% ✅	-20.71% ✅
TeaLeaf 3D openmp gcc8.3.0 armv8.4	22251	7.62%	21775	2.30%	-2.14% ✅	-5.32% ✅
TeaLeaf 3D openmp gcc9.3.0 armv8.4	22774	9.03%	21750	1.47%	-4.50% ✅	-7.56% ✅
TeaLeaf 3D openmp gcc10.3.0 armv8.4	22229	8.33%	20867	0.99%	-6.13% ✅	-7.34% ✅
TeaLeaf 3D openmp armclang20 armv8.4	40910	16.90%	37148	1.30%	-9.20% ✅	-15.60% ✅
CloverLeaf serial gcc8.3.0 armv8.4+sve	11137	10.90%	9820	2.31%	-11.83% ✅	-8.59% ✅
CloverLeaf serial gcc9.3.0 armv8.4+sve	11051	9.98%	9909	1.86%	-10.33% ✅	-8.12% ✅
CloverLeaf serial gcc10.3.0 armv8.4+sve	11140	12.80%	9462	1.80%	-15.06% ✅	-11.00% ✅
CloverLeaf serial armclang20 armv8.4+sve	11051	13.30%	9280	1.50%	-16.03% ✅	-11.80% ✅
CloverLeaf openmp gcc8.3.0 armv8.4+sve	14845	9.65%	13076	1.87%	-11.92% ✅	-7.78% ✅
CloverLeaf openmp gcc9.3.0 armv8.4+sve	15310	7.96%	13208	1.80%	-13.73% ✅	-6.16% ✅
CloverLeaf openmp gcc10.3.0 armv8.4+sve	14754	9.62%	13397	1.56%	-9.20% ✅	-8.06% ✅
CloverLeaf openmp armclang20 armv8.4+sve	14309	12.20%	12493	1.29%	-12.69% ✅	-10.91% ✅
miniBUDE openmp gcc8.3.0 armv8.4+sve	8707	14.20%	7917	3.61%	-9.07% ✅	-10.59% ✅
miniBUDE openmp gcc9.3.0 armv8.4+sve	8440	7.43%	7656	3.54%	-9.29% ✅	-3.89% ✅
miniBUDE openmp gcc10.3.0 armv8.4+sve	8458	7.41%	7695	3.41%	-9.02% ✅	-4.00% ✅
miniBUDE openmp armclang20 armv8.4+sve	8655	20.30%	7961	1.21%	-8.02% ✅	-19.09% ✅
STREAM serial gcc8.3.0 armv8.4+sve	3521	1.51%	3572	1.28%	+1.45% ❌	-0.23% ✅
STREAM serial gcc9.3.0 armv8.4+sve	3534	1.50%	3980	1.28%	+12.62% ❌	-0.22% ✅
STREAM serial gcc10.3.0 armv8.4+sve	3481	1.44%	3787	1.28%	+8.79% ❌	-0.16% ✅
STREAM serial armclang20 armv8.4+sve	2207	2.01%	2268	1.46%	+2.76% ❌	-0.55% ✅
STREAM openmp gcc8.3.0 armv8.4+sve	6821	3.90%	6740	1.79%	-1.19% ✅	-2.11% ✅
STREAM openmp gcc9.3.0 armv8.4+sve	7198	3.08%	6686	1.01%	-7.11% ✅	-2.07% ✅
STREAM openmp gcc10.3.0 armv8.4+sve	7067	2.81%	6646	1.18%	-5.96% ✅	-1.63% ✅
STREAM openmp armclang20 armv8.4+sve	6428	3.04%	5867	1.44%	-8.73% ✅	-1.60% ✅
TeaLeaf 2D serial gcc8.3.0 armv8.4+sve	11725	15.20%	10942	1.04%	-6.68% ✅	-14.16% ✅
TeaLeaf 2D serial gcc9.3.0 armv8.4+sve	11584	15.20%	10899	1.04%	-5.91% ✅	-14.16% ✅
TeaLeaf 2D serial gcc10.3.0 armv8.4+sve	11879	15.20%	11034	1.05%	-7.11% ✅	-14.15% ✅
TeaLeaf 2D serial armclang20 armv8.4+sve	11137	9.16%	7370	0.87%	-33.82% ✅	-8.29% ✅
TeaLeaf 2D openmp gcc8.3.0 armv8.4+sve	17041	7.05%	16349	1.18%	-4.06% ✅	-5.87% ✅
TeaLeaf 2D openmp gcc9.3.0 armv8.4+sve	17451	7.75%	16532	1.43%	-5.27% ✅	-6.32% ✅
TeaLeaf 2D openmp gcc10.3.0 armv8.4+sve	16463	6.70%	16187	0.85%	-1.68% ✅	-5.85% ✅
TeaLeaf 2D openmp armclang20 armv8.4+sve	52701	9.29%	49203	1.63%	-6.64% ✅	-7.66% ✅
TeaLeaf 3D serial gcc8.3.0 armv8.4+sve	12169	18.73%	10629	1.60%	-12.66% ✅	-17.13% ✅
TeaLeaf 3D serial gcc9.3.0 armv8.4+sve	12183	18.70%	10639	1.07%	-12.67% ✅	-17.63% ✅
TeaLeaf 3D serial gcc10.3.0 armv8.4+sve	12405	18.30%	10616	1.58%	-14.42% ✅	-16.72% ✅
TeaLeaf 3D serial armclang20 armv8.4+sve	19363	22.40%	15654	1.42%	-19.16% ✅	-20.98% ✅
TeaLeaf 3D openmp gcc8.3.0 armv8.4+sve	21676	7.62%	18948	2.22%	-12.59% ✅	-5.40% ✅
TeaLeaf 3D openmp gcc9.3.0 armv8.4+sve	20728	9.03%	18989	1.51%	-8.39% ✅	-7.52% ✅
TeaLeaf 3D openmp gcc10.3.0 armv8.4+sve	20438	8.33%	18652	0.85%	-8.74% ✅	-7.48% ✅
TeaLeaf 3D openmp armclang20 armv8.4+sve	41040	8.49%	37791	0.88%	-7.92% ✅	-7.61% ✅
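
For readers checking the table, the two change columns appear to be derived as sketched below. This is a minimal illustration rather than the script used to produce the numbers; the function names are hypothetical. The time column is a relative change against the BP_update baseline, while the mispredict column is a plain difference of the two rates.

```cpp
#include <cassert>
#include <cmath>

// Relative change in run time, in percent. Negative means TAGE is faster.
// (Hypothetical helper, not part of SimEng.)
double timeChangePercent(double baselineTime, double tageTime) {
  assert(baselineTime > 0.0);
  return (tageTime - baselineTime) / baselineTime * 100.0;
}

// Difference between the two misprediction rates, in percentage points.
// Negative means TAGE mispredicts less often.
double mispredictChangePoints(double baselinePct, double tagePct) {
  return tagePct - baselinePct;
}

// Round to two decimal places, matching the precision used in the table.
double roundTo2(double value) { return std::round(value * 100.0) / 100.0; }
```

Applied to the first row (CloverLeaf serial gcc8.3.0 armv8.4), `timeChangePercent(11715, 10511)` rounds to -10.28 and `mispredictChangePoints(10.70, 2.06)` rounds to -8.64, which agrees with the values listed.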

FinnWilkinson

Generally good, a few minor points

docs/sphinx/user/configuring_simeng.rst

src/include/simeng/branchpredictors/BranchHistory.hh

src/include/simeng/branchpredictors/TagePredictor.hh

jj16791 · 2024-12-14T11:50:07Z

docs/sphinx/user/configuring_simeng.rst

@@ -149,13 +149,13 @@ The Branch-Prediction section contains those options to parameterise the branch
 The current options include:

 Type
-    The type of branch predictor that is used, the options are ``Generic``, and ``Perceptron``.  Both types of predictor use a branch target buffer with each entry containing a direction prediction mechanism and a target address.  The direction predictor used in ``Generic`` is a saturating counter, and in ``Perceptron`` it is a perceptron.
+    The type of branch predictor that is used, the options are ``Generic``, ``Perceptron``, and ``Tage``.  Each of these types of predictor use prediction tables with each entry containing a direction prediction mechanism and a target address.  The direction predictor used in ``Generic`` and ``TAGE`` is a saturating counter, and in ``Perceptron`` it is a perceptron.  ``TAGE`` also uses a series of further, tagged prediction tables to provide predictions informed by greater branch histories.


Is there a good reason for using both Tage and TAGE?

There is not. I've updated to Tage throughout, as this is the capitalisation used in the config yaml.

Seems like the creator uses all forms of capitalisation

jj16791 · 2024-12-14T12:04:51Z

configs/a64fx.yaml

@@ -29,10 +29,15 @@ Queue-Sizes:
  Load: 40
  Store: 24
 Branch-Predictor:


Some TX2 diagrams note its use of a multi-history branch predictor. I assume this is TAGE-like, so maybe apply this config update to the TX2 YAML as well?

Yes, that sounds like it would be. I've updated the TX2 config as well.

jj16791 · 2024-12-14T12:06:37Z

src/lib/branchpredictors/TagePredictor.cc

+  for (uint32_t i = 0; i < numTageTables_; i++) {
+    std::vector<TageEntry> newTable;
+    for (uint32_t j = 0; j < (1ul << tageTableBits_); j++) {
+      TageEntry newEntry = {2, 0, 1, 0};


Would we not want to initialise the TageEntry with a SatCnt equal to the one used in the btb_?

Yep, good catch

jj16791 · 2024-12-14T12:16:17Z

src/lib/branchpredictors/TagePredictor.cc

+  // global history (folded onto itself to make it of the correct size).
+  uint64_t h1 = (address >> 2);
+  uint64_t h2 = globalHistory_.getFolded(1ull << (table + 1), tageTableBits_);
+  // Then truncat the XOR to make it fit thed esired size of an index


*the desired

jj16791 · 2024-12-14T12:16:41Z

src/lib/branchpredictors/TagePredictor.cc

+  // global history (folded onto itself to make it of the correct size).
+  uint64_t h1 = (address >> 2);
+  uint64_t h2 = globalHistory_.getFolded(1ull << (table + 1), tageTableBits_);
+  // Then truncat the XOR to make it fit thed esired size of an index


FinnWilkinson · 2024-12-16T14:04:00Z

Could you also add Tage to the a64fx_SME.yaml config?

dANW34V3R

Very clean PR. Nicely and precisely commented, and everything is very easily readable. I like the branch history class, which is also well explained.

dANW34V3R · 2024-12-19T18:09:14Z

src/include/simeng/branchpredictors/BranchHistory.hh

+      if (i == 0) {
+        history_[i] |= ((isTaken) ? 1 : 0);
+      } else {
+        history_[i] |= (((history_[i - 1] & (1ull << 63)) > 0) ? 1 : 0);


Does this need the conditional statement? After doing the AND you could shift right by 63 to get your 0 or 1. Would be slightly fewer cycles and more understandable/readable in my eyes (you may disagree)

I think the conditional is needed here. What's being loaded into each uint64 depends on where it is in the vector. All but the least-significant uint64 get the MSB of the next, less-significant uint64 added as the LSB, but the least-significant uint64 gets isTaken added as the LSB. However, if I'm misunderstanding your question, let me know.
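
To make the two points in this thread concrete, here is a minimal stand-alone sketch of a multi-word history shift. It is not the BranchHistory class from the PR: the function name and the use of a bare `std::vector` are assumptions for illustration. It keeps the `i == 0` branch described in the reply, and uses the shift-right-by-63 form suggested in the question for the carried bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical free function, for illustration only.
// words[0] holds the 64 most recent outcomes, with the newest in bit 0.
void shiftInOutcome(std::vector<uint64_t>& words, bool isTaken) {
  assert(!words.empty());
  // Walk from the most-significant word down, so each word reads its
  // neighbour's top bit before that neighbour is shifted.
  for (std::size_t i = words.size(); i-- > 0;) {
    words[i] <<= 1;
    if (i == 0) {
      // The least-significant word takes the new outcome as its LSB.
      words[i] |= static_cast<uint64_t>(isTaken);
    } else {
      // Every other word takes the bit carried out of the word below it.
      // Shifting a uint64_t right by 63 leaves exactly that bit (0 or 1),
      // so no mask or ternary is required.
      words[i] |= words[i - 1] >> 63;
    }
  }
}
```

Under these assumptions, the branch on `i` and the ternary on the carried bit are separate concerns: the former selects the source of the new LSB, the latter only converts a masked value to 0 or 1.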

dANW34V3R · 2024-12-19T18:15:20Z

src/include/simeng/branchpredictors/BranchHistory.hh

+   * outcome, 'position' would be 0.
+   * */
+  void updateHistory(bool isTaken, uint64_t position) {
+    if (position < size_) {


Should we assert position being < size_ as above, or are there cases where this could "validly" be greater? For instance, if you are trying to update an entry that has been lost from the history because there have been too many branches in the meantime?

Exactly as you say, I don't think that this should be an assert, as the core may validly try to update a history entry that is no longer being tracked. Allowing this means the pipeline does not need to know the size of the branch history. We're already ensuring that this doesn't cause problems with the if statement on line 82.
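
A minimal sketch of the behaviour discussed here, assuming a simplified container (the `MiniHistory` type and its members are hypothetical, not the class in the PR). An out-of-range position is silently ignored, so the caller never has to know how many outcomes are tracked.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a branch-history container, illustration only.
struct MiniHistory {
  explicit MiniHistory(uint64_t size)
      : size_(size), bits_((size + 63) / 64, 0) {
    assert(size > 0);
  }

  // Overwrite the outcome recorded 'position' branches ago (0 is newest).
  // A position at or beyond size_ refers to an outcome that has already
  // been shifted out, so the request is dropped instead of asserting.
  void updateHistory(bool isTaken, uint64_t position) {
    if (position < size_) {
      const uint64_t word = position / 64;
      const uint64_t mask = 1ull << (position % 64);
      bits_[word] = isTaken ? (bits_[word] | mask) : (bits_[word] & ~mask);
    }
  }

  // Read back a tracked outcome. Reads, unlike updates, must be in range.
  bool get(uint64_t position) const {
    assert(position < size_);
    return ((bits_[position / 64] >> (position % 64)) & 1ull) != 0;
  }

  uint64_t size_;               // Number of outcomes tracked.
  std::vector<uint64_t> bits_;  // Packed outcomes, newest in bit 0 of word 0.
};
```

For example, with `MiniHistory h(100)`, a call to `h.updateHistory(true, 500)` leaves the stored bits unchanged, whereas `h.updateHistory(true, 70)` sets the bit that `h.get(70)` then reads.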

dANW34V3R · 2024-12-20T10:32:27Z

src/include/simeng/branchpredictors/BranchHistory.hh

+ * access and manipulate large branch histories, as are needed in
+ * sophisticated branch predictors.
+ *
+ * The bits of the branch history are stored in a vector of uint64_t values,


"vector" should be "array"

dANW34V3R · 2024-12-20T10:48:28Z

src/include/simeng/branchpredictors/TagePredictor.hh

+  std::vector<std::pair<uint8_t, uint64_t>> btb_;
+
+  /** The bitlength of the Tagged tables' indices.
+   * Each tagged table with have 2^bits entries. */


With -> will

dANW34V3R · 2024-12-20T16:18:43Z

src/lib/branchpredictors/TagePredictor.cc

+
+uint64_t TagePredictor::getTag(uint64_t address, uint8_t table) {
+  // Hash function here is pretty arbitrary
+  uint64_t h1 = address;


Any reason not to remove the 2 LSBs here?

Yes. Ideally the hashes for the tag and the index should never each produce the same value for two different branches. Therefore, because the index does remove the 2 LSBs, keeping them here makes the information being passed into the hashes different and so improves the accuracy of the BP (reduces the risk of this type of accidental clashing).
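
The difference between the two hashes can be sketched as follows. This is an illustrative simplification, not the code in TagePredictor.cc: `foldHistory` is a hypothetical stand-in for the folded-history call, the history is limited to a single 64-bit word, and the parameter names are assumptions. The only detail taken from the diff is that the index hash starts from `address >> 2` while the tag hash starts from the unshifted address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: XOR-fold the low 'historyLength' bits of 'history'
// down to 'width' bits.
uint64_t foldHistory(uint64_t history, uint8_t historyLength, uint8_t width) {
  assert(width > 0 && width < 64 && historyLength <= 64);
  if (historyLength < 64) history &= (1ull << historyLength) - 1;
  const uint64_t mask = (1ull << width) - 1;
  uint64_t folded = 0;
  while (history != 0) {
    folded ^= history & mask;
    history >>= width;
  }
  return folded;
}

// Index hash: drops the two least-significant address bits, then XORs with
// the folded history and truncates to the index width.
uint64_t tableIndex(uint64_t address, uint64_t history, uint8_t historyLength,
                    uint8_t indexBits) {
  const uint64_t mask = (1ull << indexBits) - 1;
  return ((address >> 2) ^ foldHistory(history, historyLength, indexBits)) &
         mask;
}

// Tag hash: keeps the address unshifted, so it is built from a different
// slice of address bits than the index.
uint64_t tableTag(uint64_t address, uint64_t history, uint8_t historyLength,
                  uint8_t tagBits) {
  const uint64_t mask = (1ull << tagBits) - 1;
  return (address ^ foldHistory(history, historyLength, tagBits)) & mask;
}
```

With an empty history, a 4-bit index and an 8-bit tag, the addresses 0x1004 and 0x1044 map to the same index but receive different tags, which is the kind of aliasing the tag is there to detect.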

dANW34V3R · 2024-12-20T16:33:18Z

docs/sphinx/user/configuring_simeng.rst

+    Only needed for a ``Tage`` predictor.  The number of tagged tables used by the predictor, in addition to a default prediction table (i.e., the BTB).  Therefore, a value of 3 for ``Num-Tage-Tables`` would result in four total prediction tables: one BTB and three tagged tables.  If no tagged tables are desired, it is recommended to use the ``GenericPredictor`` instead.
+
+Tage-Length
+    Only needed for a ``Tage`` predictor.  The number of bits used to tage the entries of the tagged tables.


Is the "tage" in the latter sentence meant to be that, or rather "tag"?

ABenC377 added 30 commits November 1, 2024 11:39

Rebasing to dev

ff665cf

Rebasing to dev

e81673f

Rebasing to dev

40e7709

Addressing superficial comments on PR

6fa281d

Clang format

31c871e

Rebasing

Adding more detail to virtual flush and update functions re order of calls

110c1c6

Moving buffer branch flush functionality from core.cc to PipelineBuffer.hh rebasing

4b3617c

Rebasing to dev

a55e292

Rebasing to dev

17c9baf

Rebasing to dev

508a2f4

Rebasing to dev

e0f8121

Rebasing to dev

af8d1a0

Rebasing to dev

f9089e0

Rebasing to dev

3e5b507

Rebasing to dev

0445478

Rebasing to dev

f49e538

Rebasing to dev

e688a05

Rebasing to dev

1f925ea

clang format

52f9688

Rebasing to dev

6a286d3

undoing last push

d0cc56a

Updating haeders and comments

1c1b6ce

Rebasing to dev

1b800c1

replacing = with ==

3aa7ca0

Rebasing to dev

e525016

clang format

673fe87

Rebasing to dev

ace2d59

Rebasing to dev

416cc20

Rebasing to dev

92e67a8

Rebasing to dev

f051277

ABenC377 requested review from dANW34V3R and JosephMoore25 December 9, 2024 17:29

Adding include to BranchHistory.hh

22756e6

ABenC377 marked this pull request as ready for review December 10, 2024 13:06

FinnWilkinson requested changes Dec 10, 2024

ABenC377 added 2 commits December 10, 2024 16:51

Turning around Finn's comments

eba5447

Capitalising a comment

768db53

jj16791 requested changes Dec 14, 2024

FinnWilkinson previously approved these changes Dec 16, 2024

Turning vectors for indices and tags in the ftq into shared_ptrs of a fixed but dynamically chosen size

14789a6

ABenC377 dismissed FinnWilkinson’s stale review via 14789a6 December 17, 2024 14:55

ABenC377 added 7 commits December 17, 2024 14:56

predTable from uint8_t to int8_t

7dcbb16

updating how predTable is handled so that btb is -1, rather than 0 (i.e., is now different from the first tagged table)

c6dc5b5

Correcting tests after optimisation

f9da602

TAGE->Tage in the documentation

0ecdd6b

Adding Tage to TX2 config file

57f3575

Correcting typos in comments

674672f

Merge branch 'dev' into TAGE

800ce6f

Merging with changes to dev

ABenC377 requested review from jj16791 and FinnWilkinson December 17, 2024 15:25

ABenC377 added 3 commits December 17, 2024 15:28

Adding Tage to a64fx_SME.yaml

b23429f

Adjusting comments

ac0d2cf

Clang format

f9ecd29

FinnWilkinson previously approved these changes Dec 18, 2024

ABenC377 added 2 commits December 18, 2024 15:51

Merge branch 'dev' into TAGE

f11661d

Merging with dev

Merge branch 'dev' into TAGE

e00ec65

Merging with dev

dANW34V3R reviewed Dec 20, 2024

Updating comments and docs in response to PR comments

d87cc5d

ABenC377 dismissed FinnWilkinson’s stale review via d87cc5d December 30, 2024 13:14
